Abstract

We show tight bounds for both online integer multiplication and convolution in
the cell-probe model with word size $w$. For the multiplication problem, one pair of
digits, each from one of two $n$-digit numbers that are to be multiplied, is given as
input at step $i$. The online algorithm outputs a single new digit from the product of
the numbers before step $i+1$. We give a $\Theta(\frac{\delta}{w}\log n)$ bound on average per output
digit for this problem, where $2^\delta$ is the maximum value of a digit. In the convolution
problem, we are given a fixed vector $V$ of length $n$ and we consider a stream in
which numbers arrive one at a time. We output the inner product of $V$ and the
vector that consists of the last $n$ numbers of the stream. We show a $\Theta(\frac{\delta}{w}\log n)$
bound for the number of probes required per new number in the stream. All the
bounds presented hold under randomisation and amortisation. Multiplication and
convolution are central problems in the study of algorithms which also have the
widest range of practical applications.

1 Introduction 

We consider two related and fundamental problems: multiplying two integers and computing
the convolution or cross-correlation of two vectors. We study both these problems
in an online or streaming context and provide matching upper and lower bounds in the
cell-probe model. The importance of these problems is hard to overstate, with both
the integer multiplication and convolution problems playing a central role in modern
algorithms design and theory.

For notational brevity, we write $[q]$ to denote the set $\{0, \ldots, q-1\}$, where $q$ is a
positive integer.

Problem 1 (Online convolution). For a fixed vector $V \in [q]^n$ of length $n$, we consider a
stream in which numbers from $[q]$ arrive one at a time. For each arriving number, before
the next number arrives, we output the inner product (modulo $q$) of $V$ and the vector
that consists of the last $n$ numbers of the stream.



* A preliminary version of this paper appeared in ICALP '11. 
We show that there are instances of this problem such that any algorithm solving it
will require $\Omega(\frac{\delta}{w}\log n)$ amortised time on average per output, where $\delta = \log_2 q$ and $w$
is the number of bits per cell in the cell-probe model. The result is formally stated in
Theorem 3.

Problem 2 (Online multiplication). Given two numbers $X, Y \in [q^n]$, where $q$ is the base
and $n$ is the number of digits per number, we want to output the $n$ least significant digits
of the product of $X$ and $Y$, in base $q$. We must do this under the constraint that the $i$th
digit of the product (starting from the lower-order end) is output before the $(i+1)$th
digit, and when the $i$th digit is output, we only have access to the $i$ least significant
digits of $X$ and $Y$, respectively. We can think of the digits of $X$ and $Y$ arriving online
in pairs, one digit from each of $X$ and $Y$.

We show that there are instances of this problem such that any algorithm solving it
takes $\Omega(\frac{\delta}{w}\log n)$ time on average per input pair, where $\delta = \log_2 q$ and $w$ is the number
of bits per cell in the cell-probe model. The result is formally stated in Theorem 12.

Our main technical innovation is to extend recently developed methods designed to
give lower bounds on dynamic data structures to the seemingly distinct field of online
algorithms. Where $\delta = w$, for example, we have $\Omega(\log n)$ lower bounds for both online
multiplication and convolution, thereby matching the currently best known offline upper
bounds in the RAM model. As we discuss in Section 1.1, this may be the highest
lower bound that can be formally proved for these problems without a further significant
theoretical breakthrough.

For the convolution problem, one consequence of our results is a new separation
between the time complexity of exact and inexact string matching in a stream. The
convolution has played a particularly important role in the field of combinatorial pattern
matching, where many of the fastest algorithms rely crucially for their speed on the
use of fast Fourier transforms (FFTs) to perform repeated convolutions. These methods
have also been extended to allow searching for patterns in rapidly processed data
streams [CEPP11, CS11]. The results we present here therefore give the first strict separation
between the constant time complexity of online exact matching [Gal81] and any
convolution-based online pattern matching algorithm.

Although we show only the existence of probability distributions on the inputs for
which we can prove lower bounds on the expected running time of any deterministic
algorithm, by Yao's minimax principle [Yao77] this also immediately implies that for every
(randomised) algorithm, there is a worst-case input such that the (expected) running
time is equally high. Therefore our lower bounds hold equally for randomised algorithms
as for deterministic ones.

The lower bounds we show for both online multiplication and convolution are also
tight within the cell-probe model. This can be seen by application of reductions
described in [FS73, CEPP11]. It was shown there that any offline algorithm for multiplication
[FS73] or convolution [CEPP11] can be converted to an online one with at most
an $O(\log n)$ factor overhead. For details of these reductions we refer the reader to the
original papers. In our case, the same approach also allows us to directly convert any
cell-probe algorithm from an offline to online setting. An offline cell-probe algorithm
for either multiplication or convolution could first read the whole input, then compute
the answers and finally output them. This takes $O(\frac{\delta}{w} n)$ cell probes. We can therefore
derive online cell-probe algorithms which take only $O(\frac{\delta}{w}\log n)$ probes per output, hence
matching the new lower bound we give.

1.1 Previous results and upper bounds in the RAM model 

The best time complexity lower bounds for online multiplication of two $n$-bit numbers
were given in 1974 by Paterson, Fischer and Meyer. They presented an $\Omega(\log n)$
lower bound for multitape Turing machines [PFM74] and also gave an $\Omega(\log n/\log\log n)$
lower bound for the 'bounded activity machine' (BAM). The BAM, which is a strict
generalisation of the Turing machine model but which has nonetheless largely fallen out
of favour, attempts to capture the idea that future states can only depend on a limited
part of the current configuration. To the authors' knowledge, there has been no progress
on cell-probe lower bounds for online multiplication or convolution prior to the work
we present here.

There have, however, been attempts to provide offline lower bounds for the related
problem of computing the FFT. In [Mor73] Morgenstern gave an $\Omega(n\log n)$ lower bound
conditional on the assumption that the underlying field of the transform is the complex
numbers and that the modulus of any complex numbers involved in the computation is
at most 1. Papadimitriou gave the same $\Omega(n\log n)$ lower bound for FFTs of length a
power of two, this time excluding certain classes of algorithms, including those that rely
on linear mathematical relations among the roots of unity [Pap79]. This work had the
advantage of giving a conditional lower bound for FFTs over more general algebras than
was previously possible, including, for example, finite fields. In 1986 Pan [Pan86] showed
that another class of algorithms having a so-called synchronous structure must require
$\Omega(n\log n)$ time for the computation of both the FFT and convolution.

The fastest known algorithms for both offline integer multiplication and convolution
in the word-RAM model require $O(n\log n)$ time by a well known application of a constant
number of FFTs. As a consequence, our online lower bounds match the best known time
upper bounds for the offline problem. As we discussed above, our lower bounds are also
tight within the cell-probe model for the online problems. The question now naturally
arises as to whether one can find higher lower bounds in the RAM model. This appears
as an interesting question, as there remains a gap between the best known time upper
bounds provided by existing algorithms and the lower bounds that we give within the
cell-probe model. However, as we mention above, any offline algorithm for convolution
or multiplication can be converted to an online one with at most an $O(\log n)$ factor
overhead [FS73, CEPP11]. As a consequence, it is likely to be hard to prove a higher
lower bound for the online problem than we have given, at least for the case where
$\delta/w \in O(1)$, as this would immediately imply a superlinear lower bound for offline
convolution or multiplication. Such superlinear lower bounds are not yet known for any
problem in NP except in very restricted models of computation, such as, for example,
a single-tape Turing machine. Our only alternative route to tight time bounds
would be to find better upper bounds for the online problems. For the case of online
multiplication at least, this has been an open problem since at least 1973 and has so far
resisted our best attempts.



1.2 The cell-probe model 

When stating lower bounds it is important to be precise about the model in which the
bounds apply. Our bounds in this paper hold in perhaps the strongest model of them
all, the cell-probe model, introduced originally by Minsky and Papert [MP69, MP88]
in a different context and then subsequently by Fredman [Fre78] and Yao [Yao81]. In
this model, there is a separation between the computing unit and the memory, which
is external and consists of a set of cells of $w$ bits each. The computing unit cannot
remember any information between operations. Computation is free and the cost is
measured only in the number of cell reads or writes (cell probes). This general view
makes the model very strong, subsuming for instance the popular word-RAM model.
In the word-RAM model certain operations on words, such as addition, subtraction
and possibly multiplication, take constant time (see for example [Hag98] for a detailed
introduction). Here a word corresponds to a cell. Typically we think of the cell size $w$
as being at least $\log_2 n$ bits, where $n$ is the number of cells. This allows each cell to hold
the address of any location in memory.

The generality of the cell-probe model makes it particularly attractive for establishing
lower bounds for data structure problems, and many such results have been given in the
past couple of decades. The approaches taken have until recently mainly been based
on communication complexity arguments and the chronogram technique of Fredman
and Saks [FS89]. There remain, however, a number of unsatisfying gaps between the
lower bounds and known upper bounds. Only a few years ago, a breakthrough led
by Demaine and Pătraşcu gave us the tools to seal the gaps for several data structure
problems [PD06]. The new technique was based on information-theoretic arguments.
Demaine and Pătraşcu also presented ideas which allowed them to express more refined
lower bounds, such as trade-offs between updates and queries of dynamic data structures.
For a list of data structure problems and their lower bounds using these and related
techniques, see for example [Pat08].



1.3 Organisation 

We present the new cell-probe lower bound for online convolution in Section 2, along
with the main techniques that we will use throughout. In Section 3 we show how these
can then be applied to the problem of online multiplication.

2 Online convolution 

For a vector $V$ of length $n$ and $i \in [n]$, we write $V[i]$ to denote the $i$th element of $V$. For
positive integers $n$ and $q$, the inner product of two vectors $U, V \in [q]^n$, denoted $\langle U, V\rangle$,
is defined as

$$\langle U, V\rangle = \sum_{i\in[n]} \big(U[i]\cdot V[i]\big).$$

Parameterised by two positive integers $n$ and $q$, and a fixed vector $V \in [q]^n$, the online
convolution problem asks to maintain a vector $U \in [q]^n$ subject to an operation next($\Delta$),
which takes a parameter $\Delta \in [q]$, modifies $U$ to be the vector $(U[1], U[2], \ldots, U[n-1], \Delta)$
and then returns the inner product $\langle U, V\rangle$. In other words, next($\Delta$) modifies $U$ by
shifting all elements one step to the left, pushing the leftmost element out, and setting
the new rightmost element to $\Delta$. We consider the online convolution problem over the
ring $\mathbb{Z}/q\mathbb{Z}$, that is, integer arithmetic modulo $q$. Let $\delta = \log_2 q$.
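For readers who prefer code, the next-operation can be mirrored by a deliberately naive Python sketch. The class name `NaiveOnlineConvolution` is our own, and this is not an algorithm from the paper: it recomputes $\langle U, V\rangle \bmod q$ from scratch, so each call costs $O(n)$ arithmetic operations, far more than the $\Theta(\frac{\delta}{w}\log n)$ probes the bounds concern. It only pins down what must be output.

```python
from collections import deque


class NaiveOnlineConvolution:
    """Reference (non-optimised) solution of the online convolution problem.

    Keeps the window U of the last n stream values and recomputes the
    inner product <U, V> mod q from scratch on every next(delta) call.
    """

    def __init__(self, V, q):
        self.V = list(V)
        self.q = q
        # U starts as the all-zero vector of length n.
        self.U = deque([0] * len(self.V), maxlen=len(self.V))

    def next(self, delta):
        # Appending to a full deque with maxlen discards U[0]: this is the
        # left shift followed by setting the new rightmost element to delta.
        self.U.append(delta % self.q)
        return sum(u * v for u, v in zip(self.U, self.V)) % self.q
```

For example, with $V = (1, 2, 3)$ and $q = 7$, feeding the values 4, 5, 6 returns 5, 2, 4.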

Theorem 3. For any positive integers $q$ and $n$, in the cell-probe model with $w$ bits
per cell there exist instances of the online convolution problem such that the expected
amortised time per next-operation is $\Omega(\frac{\delta}{w}\log n)$, where $\delta = \log_2 q$.

In order to prove Theorem 3 we will consider a random instance that is described
by $n$ next-operations on the sequence $A = (\Delta_0, \ldots, \Delta_{n-1})$, where each $\Delta_i$ is chosen
independently and uniformly at random from $[q]$. We defer the choice of the fixed vector
$V$ until later. For $t$ from $0$ to $n-1$, we use $t$ to denote the time, and we say that the
operation next($\Delta_t$) occurs at time $t$.

We may assume that prior to the first update, the vector $U = (0, \ldots, 0)$, although any
values are possible since they do not influence the analysis. To avoid technicalities we
will from now on assume that $n$ is a power of two.



2.1 Information transfer 



Following the overall approach of Demaine and Pătraşcu [PD04] we will consider adjacent
time intervals and study the information that is transferred from the operations in one
interval to the next interval. More precisely, let $t_0, t_1, t_2 \in [n]$ such that $t_0 \le t_1 < t_2$ and
consider any algorithm solving the online convolution problem. We would like to keep
track of the memory cells that are written to during the time interval $[t_0, t_1]$ and then
read during the succeeding interval $[t_1+1, t_2]$. The information from the next-operations
taking place in the interval $[t_0, t_1]$ that the algorithm passes on to the interval $[t_1+1, t_2]$
must be contained in these cells. Informally, there is no other way for the
algorithm to determine what occurred during the interval $[t_0, t_1]$ except through these
cells. Formally, the information transfer, denoted $IT(t_0, t_1, t_2)$, is defined to be the set
of memory cells $c$ such that $c$ is written during $[t_0, t_1]$, read at a time $t_r \in [t_1+1, t_2]$ and
not written during $[t_1+1, t_r]$. Hence a cell that is overwritten in $[t_1+1, t_2]$ before being
read is not included in the information transfer. Observe that the information transfer
depends on the algorithm, the vector $V$ and the sequence $A$. The first aim is to show
that for any choice of algorithm solving the online convolution problem, the number of
cells in the information transfer is bounded from below by a sufficiently large number
for some choice of the vector $V$.
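The definition can be made operational on a recorded run. In the sketch below, a run is assumed to be logged as a list of `(time, op, address)` triples in execution order; this log format and the name `information_transfer` are illustrative assumptions, not part of the cell-probe model. A cell is counted when a read in $[t_1+1, t_2]$ finds that its latest write happened in $[t_0, t_1]$, which is how we read the condition that the cell is not overwritten before being read.

```python
def information_transfer(probe_log, t0, t1, t2):
    """Return IT(t0, t1, t2) for a logged run of a cell-probe algorithm.

    probe_log: list of (time, op, address) triples in execution order,
    with op either "R" (read) or "W" (write).
    """
    last_write = {}   # address -> time of the latest write seen so far
    transfer = set()
    for time, op, addr in probe_log:
        if op == "W":
            last_write[addr] = time
        elif op == "R" and t1 + 1 <= time <= t2:
            written_at = last_write.get(addr)
            # Only cells whose current contents were produced in [t0, t1]
            # carry information from that interval to this read.
            if written_at is not None and t0 <= written_at <= t1:
                transfer.add(addr)
    return transfer
```

In a log where cell 11 is written at time 1, rewritten at time 2 and read at time 3, the read does not put cell 11 into $IT(0, 1, 3)$.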

For $0 \le t_0 \le t_1 < n$, we write $A_{[t_0,t_1]}$ to denote the subsequence $(\Delta_{t_0}, \ldots, \Delta_{t_1})$ of
$A$, and $A_{[t_0,t_1]^c}$ to denote the sequence $(\Delta_0, \ldots, \Delta_{t_0-1}, \Delta_{t_1+1}, \ldots, \Delta_{n-1})$, which contains
all the elements of $A$ except for those in $A_{[t_0,t_1]}$. For $t \in [n]$, we let $P_t \in [q]$ denote the
inner product returned by next($\Delta_t$) at time $t$ (recall that we operate modulo $q$). We let

$$P_{[t_1+1,t_2]} = (P_{t_1+1}, \ldots, P_{t_2}).$$

Since $A$ is a random variable, so is $P_{[t_1+1,t_2]}$. In particular, if we condition on a fixed
choice of $A_{[t_0,t_1]^c}$, call it $A^{\mathrm{fix}}_{[t_0,t_1]^c}$, then $P_{[t_1+1,t_2]}$ is a random variable that depends on
the random values in $A_{[t_0,t_1]}$. The dependency on the next-operations in the interval
$[t_0,t_1]$ is captured by the information transfer $IT(t_0,t_1,t_2)$, which must encode all the
relevant information in order for the algorithm to correctly output the inner products
in $[t_1+1,t_2]$. In other words, the length of an encoding of the information supplied by cells in the
information transfer is an upper bound on the conditional entropy of $P_{[t_1+1,t_2]}$. This fact
is stated in Lemma 4 and was given in [Pat08] with small notational differences.



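Lemma 4 (Lemma 3.2 of [Pat08]). The entropy

$$H\big(P_{[t_1+1,t_2]} \mid A_{[t_0,t_1]^c} = A^{\mathrm{fix}}_{[t_0,t_1]^c}\big) \le w + 2w\cdot\mathbb{E}\big[\,|IT(t_0,t_1,t_2)| \mid A_{[t_0,t_1]^c} = A^{\mathrm{fix}}_{[t_0,t_1]^c}\big].$$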



Proof. The average length of any encoding of $P_{[t_1+1,t_2]}$ (conditioned on $A_{[t_0,t_1]^c} = A^{\mathrm{fix}}_{[t_0,t_1]^c}$) is an
upper bound on its entropy. We use the information transfer as an encoding in the
following way. For every cell $c$ in the information transfer $IT(t_0,t_1,t_2)$, we store the
address of $c$, which takes at most $w$ bits under the assumption that a cell can hold
the address of every cell, and we store the contents of $c$, which is a cell of $w$ bits. In total
this requires $2w\cdot|IT(t_0,t_1,t_2)|$ bits. In addition, we store the size of the information
transfer, $|IT(t_0,t_1,t_2)|$, so that any algorithm decoding the stored information knows
how many cells are stored and hence when to stop checking for stored cells. Storing
the size of the information transfer requires $w$ bits, thus the average total length of the
encoding is $w + 2w\cdot\mathbb{E}[\,|IT(t_0,t_1,t_2)| \mid A_{[t_0,t_1]^c} = A^{\mathrm{fix}}_{[t_0,t_1]^c}]$.

In order to prove that the described encoding is valid, we describe how to decode
the stored information. We do this by simulating the algorithm. First we simulate
the algorithm from time $0$ to $t_0 - 1$. We have no problem doing so since all necessary
information is available in $A^{\mathrm{fix}}_{[t_0,t_1]^c}$, which we know. We then skip from time $t_0$ to
$t_1$ and resume simulating the algorithm from time $t_1+1$ to $t_2$. In this interval, the
algorithm outputs the values in $P_{[t_1+1,t_2]}$. In order to do so correctly, the algorithm
might need information from the next-operations in $[t_0,t_1]$. This information is only
available through the encoding described above. When simulating the algorithm, for
each cell $c$ we read, we check if the address of $c$ is contained in the list of addresses that
was stored. If so, we obtain the contents of $c$ by reading its stored value. Each time we
write to a cell whose address is in the list of stored addresses, we remove it from the
stored list, or blank it out. Note that every cell we read whose address is not in the
stored list contains a value that was written last either before time $t_0$ or after time $t_1$.
Hence its value is known to us. □
2.2 Recovering information



In the previous section, we provided an upper bound on the entropy of the outputs from
the next-operations in $[t_1+1, t_2]$. Next we will explore how much information needs
to be communicated from $[t_0,t_1]$ to $[t_1+1,t_2]$. This will provide a lower bound on the
entropy. As we will see, the lower bound can be expressed as a function of the length of
the intervals and the vector $V$.

Suppose that $[t_0,t_1]$ and $[t_1+1,t_2]$ both have the same length $\ell$. That is, $t_1 - t_0 + 1 =
t_2 - t_1 = \ell$. For $i \in [\ell]$, the output at time $t_1+1+i$ can be broken into two sums $S_i$ and
$S'_i$, such that $P_{t_1+1+i} = S_i + S'_i$, where

$$S_i = \sum_{j\in[\ell]} \big(V[n-1-(i+\ell)+j]\cdot\Delta_{t_0+j}\big)$$

is the contribution from the alignment of $V$ with $A_{[t_0,t_1]}$, and $S'_i$ is the contribution from
the alignments that do not include $A_{[t_0,t_1]}$.

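We define $M_{V,\ell}$ to be the $\ell\times\ell$ matrix with entries $M_{V,\ell}(i,j) = V[n-1-(i+\ell)+j]$.
That is,

$$M_{V,\ell} = \begin{pmatrix}
V[n-\ell-1] & V[n-\ell] & V[n-\ell+1] & \cdots & V[n-2]\\
V[n-\ell-2] & V[n-\ell-1] & V[n-\ell] & \cdots & V[n-3]\\
V[n-\ell-3] & V[n-\ell-2] & V[n-\ell-1] & \cdots & V[n-4]\\
\vdots & \vdots & \vdots & \ddots & \vdots\\
V[n-2\ell] & V[n-2\ell+1] & V[n-2\ell+2] & \cdots & V[n-\ell-1]
\end{pmatrix}.$$

We observe that $M_{V,\ell}$ is a Toeplitz matrix (or "upside down" Hankel matrix) since it is
constant on each descending diagonal from left to right. This property will be important
later. From the definitions above it follows that

$$M_{V,\ell}\begin{pmatrix}\Delta_{t_0}\\ \Delta_{t_0+1}\\ \vdots\\ \Delta_{t_1}\end{pmatrix} = \begin{pmatrix}S_0\\ S_1\\ \vdots\\ S_{\ell-1}\end{pmatrix}. \qquad (1)$$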
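Equation (1) can be checked numerically. The sketch below (function names are our own) builds $M_{V,\ell}$ from its entry formula, confirms that it is Toeplitz, and computes each output $P_t$ directly from the window definition, with positions before time 0 holding zeros as assumed above. By linearity, subtracting the output obtained with $A_{[t_0,t_1]}$ replaced by zeros removes $S'_i$ and isolates $S_i$, which should equal row $i$ of $M_{V,\ell}$ applied to $(\Delta_{t_0}, \ldots, \Delta_{t_1})$.

```python
def recovery_matrix(V, ell):
    """The ell x ell matrix M_{V,ell} with entries V[n-1-(i+ell)+j]."""
    n = len(V)
    assert 2 * ell <= n, "both intervals must fit inside one window of length n"
    return [[V[n - 1 - (i + ell) + j] for j in range(ell)] for i in range(ell)]


def is_toeplitz(M):
    """True if every descending diagonal of M is constant."""
    return all(M[i][j] == M[i - 1][j - 1]
               for i in range(1, len(M)) for j in range(1, len(M[0])))


def window_output(V, stream, t, q):
    """P_t = <U, V> mod q at time t, where U holds the last n stream values."""
    n = len(V)
    # U[n-1-k] = stream[t-k]; positions before time 0 contribute zero.
    return sum(V[n - 1 - k] * stream[t - k]
               for k in range(n) if t - k >= 0) % q
```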



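We define the recovery number $R_{V,\ell}$ to be the number of variables $x \in \{x_1, \ldots, x_\ell\}$
such that $x$ can be determined uniquely by the system of linear equations

$$M_{V,\ell}\begin{pmatrix}x_1\\ \vdots\\ x_\ell\end{pmatrix} = \begin{pmatrix}y_1\\ \vdots\\ y_\ell\end{pmatrix},$$

where we operate in $\mathbb{Z}/q\mathbb{Z}$. The recovery number may be distinct from the rank of
a matrix, even where we operate over a field. As an example, consider the $\ell\times\ell$ all-ones
matrix with $\ell \ge 2$: every equation reads $x_1 + \cdots + x_\ell = y$, so no single variable is
determined, and the matrix has recovery number zero but rank one. The recovery number
is, however, related to the conditional entropy of $P_{[t_1+1,t_2]}$, as described by the next lemma.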
Lemma 5. If the intervals $[t_0,t_1]$ and $[t_1+1,t_2]$ both have the same length $\ell$, then the
entropy

$$H\big(P_{[t_1+1,t_2]} \mid A_{[t_0,t_1]^c} = A^{\mathrm{fix}}_{[t_0,t_1]^c}\big) \ge \delta\cdot R_{V,\ell}.$$

Proof. As described above, for $i \in [\ell]$, $P_{t_1+1+i} = S_i + S'_i$, where $S'_i$ is a constant that
only depends on $V$ and $A^{\mathrm{fix}}_{[t_0,t_1]^c}$. Hence we can compute the values $S_0, \ldots, S_{\ell-1}$ from
$P_{[t_1+1,t_2]}$. From Equation (1) it follows that $S_0, \ldots, S_{\ell-1}$ uniquely specify $R_{V,\ell}$ of the
parameters in $A_{[t_0,t_1]}$. That is, we can recover $R_{V,\ell}$ of the parameters from the interval
$[t_0,t_1]$. Each of these parameters is a random variable that is uniformly distributed in
$[q]$, so it contributes $\delta$ bits of entropy. □
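For small matrices the recovery number can be computed directly. Over a field, $x_k$ takes the same value in every solution of a consistent system $Mx = y$ exactly when the unit vector $e_k$ lies in the row space of $M$. The sketch below therefore assumes $q$ is prime (as in Lemma 8), uses helper names of our own, and counts the indices $k$ for which appending $e_k$ to $M$ leaves the rank over $\mathbb{Z}/q\mathbb{Z}$ unchanged.

```python
def _rank_mod_p(rows, p):
    """Rank of a matrix (given as a list of rows) over the field Z/pZ, p prime."""
    rows = [[x % p for x in r] for r in rows]
    rank, ncols = 0, len(rows[0]) if rows else 0
    for col in range(ncols):
        pivot = next((r for r in range(rank, len(rows)) if rows[r][col]), None)
        if pivot is None:
            continue
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        inv = pow(rows[rank][col], p - 2, p)        # inverse via Fermat, p prime
        rows[rank] = [(x * inv) % p for x in rows[rank]]
        for r in range(len(rows)):                  # clear the column elsewhere
            if r != rank and rows[r][col]:
                f = rows[r][col]
                rows[r] = [(a - f * b) % p for a, b in zip(rows[r], rows[rank])]
        rank += 1
    return rank


def recovery_number(M, p):
    """Number of unknowns x_k fixed uniquely by M x = y over Z/pZ (p prime).

    x_k is determined exactly when e_k lies in the row space of M, i.e. when
    adding e_k as an extra row does not increase the rank.
    """
    ell = len(M[0])
    base = _rank_mod_p(M, p)
    count = 0
    for k in range(ell):
        e_k = [1 if j == k else 0 for j in range(ell)]
        if _rank_mod_p(M + [e_k], p) == base:
            count += 1
    return count
```

On the $3\times 3$ all-ones matrix it returns 0, matching the example above, and on an invertible matrix it returns $\ell$.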

We now combine Lemmas 4 and 5 in the following corollary.

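Corollary 6. For any fixed vector $V$, two intervals $[t_0,t_1]$ and $[t_1+1,t_2]$ of the same
length $\ell$, and any algorithm solving the online convolution problem on $A$ chosen uniformly
at random from $[q]^n$,

$$\mathbb{E}\big[\,|IT(t_0,t_1,t_2)|\,\big] \ge \frac{\delta\cdot R_{V,\ell}}{2w} - \frac{1}{2}.$$

Proof. For $A_{[t_0,t_1]^c}$ fixed to $A^{\mathrm{fix}}_{[t_0,t_1]^c}$, comparing Lemmas 4 and 5 we see that

$$\mathbb{E}\big[\,|IT(t_0,t_1,t_2)| \mid A_{[t_0,t_1]^c} = A^{\mathrm{fix}}_{[t_0,t_1]^c}\big] \ge \frac{\delta\cdot R_{V,\ell}}{2w} - \frac{1}{2}.$$

The result follows by taking expectation over $A_{[t_0,t_1]^c}$ under the random sequence $A$. □
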
2.3 The lower bound for online convolution 

We now show how a lower bound on the total number of cell reads over $n$ next-operations
can be obtained by summing the information transfer between many pairs of time
intervals. We again follow the approach of Demaine and Pătraşcu [PD04], which involves
conceptually constructing a balanced tree over the time axis. This lower bound tree,
denoted $T$, is a balanced binary tree on $n$ leaves. Recall that we have assumed that $n$
is a power of two. The leaves, from left to right, represent the time $t$ from $0$ to $n-1$,
respectively. An internal node $v$ is associated with the times $t_0 \le t_1 < t_2$ such that
the two intervals $[t_0,t_1]$ and $[t_1+1,t_2]$ span the left subtree and the right subtree of
$v$, respectively. For example, in Figure 1 the node labelled $v$ is associated with the
intervals $[16,23]$ and $[24,31]$.

For an internal node $v$ of $T$, we write $IT(v)$ to denote $IT(t_0,t_1,t_2)$, where $t_0$, $t_1$, $t_2$
are associated with $v$. We write $L(v)$ to denote the number of leaves in the left (same as
the right) subtree of $v$. The key lemma, stated next, is a modified version of Theorem 3.6
in [Pat08]. The statement of the lemma is adapted to our online convolution problem
and the proof relies on Corollary 6.
[Figure 1 image: leaves labelled 0 to 31 from left to right, with one internal node labelled v.]

Figure 1: A lower bound tree $T$ over $n = 32$ operations.

Lemma 7. For any fixed vector $V$ and any algorithm solving the online convolution
problem, the expected running time of the algorithm over a sequence $A$ that is chosen
uniformly at random from $[q]^n$ is at least

$$\sum_{v\in T}\left(\frac{\delta\cdot R_{V,L(v)}}{2w} - \frac{1}{2}\right),$$

where the sum is over the internal nodes of $T$.

Proof. We first consider a fixed sequence $A$. We argue that the number of read
instructions executed by the algorithm is at least $\sum_{v\in T}|IT(v)|$. For a read
instruction, let $t_r$ be the time it is executed. Let $t_w \le t_r$ be the time the cell was last
written, and ignore the read if $t_w = t_r$. Then this read instruction (the cell it acts upon) is contained
in $IT(v)$, where $v$ is the lowest common ancestor of $t_w$ and $t_r$. Thus, $\sum_{v\in T}|IT(v)|$ never
double-counts a read instruction.

For a random $A$, an expected lower bound on the number of read instructions is
therefore $\mathbb{E}[\sum_{v\in T}|IT(v)|]$. Using linearity of expectation and Corollary 6 we obtain
the lower bound in the statement of the lemma. □
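The charging argument only needs to know, for a read at time $t_r$ of a cell last written at $t_w < t_r$, which node of $T$ it is charged to. Since the leaves are the times $0, \ldots, n-1$ in order, the lowest common ancestor is fixed by the most significant bit in which $t_w$ and $t_r$ differ. A short sketch, with `lca_node` as our own illustrative name:

```python
def lca_node(t_w, t_r):
    """Node (t0, t1, t2) of the lower bound tree charged for a read at time t_r
    of a cell last written at time t_w < t_r.

    The lowest common ancestor of leaves t_w and t_r has height
    h = bit_length(t_w XOR t_r): the two times agree above bit h-1, and
    t_w (bit h-1 equal to 0) lies in its left subtree while t_r
    (bit h-1 equal to 1) lies in its right subtree.
    """
    assert 0 <= t_w < t_r
    h = (t_w ^ t_r).bit_length()
    t0 = (t_r >> h) << h            # first leaf below the ancestor
    t1 = t0 + (1 << (h - 1)) - 1    # last leaf of its left subtree
    t2 = t0 + (1 << h) - 1          # last leaf of its right subtree
    return t0, t1, t2
```

For instance, `lca_node(20, 25)` returns `(16, 23, 31)`, the node labelled $v$ in Figure 1.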



2.3.1 Lower bound with a random vector V 

We have seen in Lemma 7 that the lower bound is highly dependent on the recovery
numbers of the vector $V$. In the next lemma, we show that a random vector $V$ has
large recovery numbers.

Lemma 8. Suppose that $q$ is a prime and the vector $V$ is chosen uniformly at random
from $[q]^n$. Then $\mathbb{E}[R_{V,\ell}] \ge \ell/2$ for every length $\ell$.



Proof. Recall that $M_{V,\ell}$ is an $\ell\times\ell$ Toeplitz matrix. It has been shown in [KL96] that for
any $\ell$, out of all the $\ell\times\ell$ Toeplitz matrices over a finite field of $q$ elements, a fraction of
exactly $(1 - 1/q)$ is non-singular. This fact was actually already established almost 40 years
earlier, but incidentally reproved in [KL96]. Since we have assumed in
the statement of the lemma that $q$ is a prime, the ring $\mathbb{Z}/q\mathbb{Z}$ we operate in is indeed
a finite field. The diagonals of $M_{V,\ell}$ are independent and uniformly distributed in $[q]$,
hence the probability that $M_{V,\ell}$ is invertible is $(1 - 1/q) \ge 1/2$. If $M_{V,\ell}$ is invertible
then the recovery number $R_{V,\ell} = \ell$; there is a unique solution to the system of linear
equations in Equation (1). On the other hand, if $M_{V,\ell}$ is not invertible then the recovery
number will be lower. Thus, we can safely say that the expected recovery number $R_{V,\ell}$
is at least $\ell/2$, which proves the lemma. □



Before we give a lower bound for a random choice of $V$ in Theorem 10 below, we
state the following fact.

Fact 9. For a balanced binary tree with $n$ leaves, the sum of the number of leaves in the
subtree rooted at $v$, taken over all internal nodes $v$, is $n\log_2 n$.
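Fact 9 is easy to confirm by enumerating the internal nodes of the lower bound tree level by level: each of the $\log_2 n$ levels covers all $n$ leaves exactly once. The helper names below are our own; nodes are represented by their associated times $(t_0, t_1, t_2)$.

```python
def internal_nodes(n):
    """Yield (t0, t1, t2) for every internal node of the lower bound tree on
    leaves 0..n-1, where n is a power of two."""
    size = 2
    while size <= n:
        half = size // 2
        for t0 in range(0, n, size):
            yield t0, t0 + half - 1, t0 + size - 1
        size *= 2


def subtree_leaf_sum(n):
    """Sum over internal nodes v of the number of leaves below v (= 2 L(v))."""
    return sum(t2 - t0 + 1 for t0, _, t2 in internal_nodes(n))
```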

Theorem 10. Suppose that $q$ is a prime. In the cell-probe model with $w$ bits per cell,
any algorithm solving the online convolution problem on a vector $V$ and a sequence $A$, both chosen
uniformly at random from $[q]^n$, will run in $\Omega(\frac{\delta}{w} n\log n)$ time in expectation, where $\delta = \log_2 q$.

Proof. For a random vector $V$, a lower bound is obtained by taking the expectation of
the bound in the statement of Lemma 7. Using linearity of expectation and applying
Lemma 8 and Fact 9 completes the proof. □
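Spelled out, the calculation behind this proof runs as follows. The expansion is ours and the constants are not optimised; it uses that $T$ has $n-1$ internal nodes and that the subtree of $v$ has $2L(v)$ leaves, so Fact 9 gives $\sum_{v\in T} L(v) = \frac{1}{2}n\log_2 n$.

```latex
\begin{align*}
\mathbb{E}_V\Bigl[\sum_{v\in T}\Bigl(\frac{\delta R_{V,L(v)}}{2w}-\frac12\Bigr)\Bigr]
  &= \sum_{v\in T}\Bigl(\frac{\delta\,\mathbb{E}[R_{V,L(v)}]}{2w}-\frac12\Bigr)
     && \text{(linearity of expectation)}\\
  &\ge \sum_{v\in T}\Bigl(\frac{\delta L(v)}{4w}-\frac12\Bigr)
     && \text{(Lemma 8)}\\
  &= \frac{\delta}{4w}\cdot\frac{n\log_2 n}{2}-\frac{n-1}{2}
     && \text{(Fact 9; $T$ has $n-1$ internal nodes)}\\
  &= \frac{\delta\,n\log_2 n}{8w}-\frac{n-1}{2}.
\end{align*}
```

The right-hand side is at least $\frac{\delta n\log_2 n}{16w}$ whenever $\frac{\delta}{w}\log_2 n \ge 8$.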

Remark. Theorem 10 requires that $q$ is a prime, but for an integer $\delta > 1$, $q = 2^\delta$
is not a prime. However, by Bertrand's postulate there is always at least one prime $p$ such that
$2^{\delta-1} < p < 2^\delta$. Thus, Theorem 10 is applicable for any integer $\delta$, only with an adjustment
by at most one.
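A small Python sketch (our own helpers, plain trial division) finds the largest such prime for a given $\delta \ge 2$. Since $p > 2^{\delta-1}$, we have $\log_2 p > \delta - 1$, so working modulo $p$ instead of $2^\delta$ loses less than one bit per value.

```python
def is_prime(m):
    """Trial-division primality test, adequate for the small values used here."""
    if m < 2:
        return False
    d = 2
    while d * d <= m:
        if m % d == 0:
            return False
        d += 1
    return True


def prime_below_power_of_two(delta):
    """Largest prime p with 2**(delta-1) < p < 2**delta, for delta >= 2.

    Bertrand's postulate guarantees that such a p exists, so the downward
    search succeeds before reaching 2**(delta-1).
    """
    for p in range(2 ** delta - 1, 2 ** (delta - 1), -1):
        if is_prime(p):
            return p
    raise AssertionError("unreachable by Bertrand's postulate")
```

For example, $\delta = 8$ gives $p = 251$.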



2.3.2 Lower bound with a fixed vector V

We demonstrate next that it is possible to design a fixed vector $V$ with guaranteed large
recovery numbers. We will use this vector in the proof of Theorem 3. The idea is to
let $V$ consist of stretches of 0s interspersed with 1s. The distance between two succeeding
1s is an increasing power of two, ensuring that for half of the alignments in the interval
$[t_1+1,t_2]$, all but exactly one element of $A_{[t_0,t_1]}$ are simultaneously aligned with a 0 in
$V$. We define the binary vector $K_n \in [2]^n$ to be

$$K_n = (\ldots 0000000000000100000000000000010000000100010110),$$

or formally,

$$K_n[i] = \begin{cases} 1, & \text{if } n-1-i \text{ is a power of two;}\\ 0, & \text{otherwise.}\end{cases} \qquad (2)$$

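Lemma 11. Suppose $V = K_n$ and $\ell \ge 1$ is a power of two. The recovery number
$R_{V,\ell} \ge \ell/2$.

Proof. Recall that $M_{V,\ell}(i,j) = V[n-1-(i+\ell)+j]$. Thus, $M_{V,\ell}(i,j) = 1$ if and
only if $n-1-(n-1-(i+\ell)+j) = i+\ell-j$ is a power of two. For $\ell/2 \le i \le \ell-1$ and $j \in [\ell]$,
the value $i+\ell-j$ lies between $i+1 \ge \ell/2+1$ and $i+\ell \le 2\ell-1$, and the only power of two in
that range is $\ell$, attained at $j = i$. It follows that for rows $i = \ell/2, \ldots, \ell-1$, $M_{V,\ell}(i,j) = 1$ for
$j = i$ and $M_{V,\ell}(i,j) = 0$ for $j \ne i$. This implies that the recovery number $R_{V,\ell}$ is at least $\ell/2$. □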
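Lemma 11 can also be checked mechanically for small sizes. The sketch below (our own function names) builds $K_n$ from Equation (2) and lists the rows of $M_{K_n,\ell}$ that are unit vectors $e_i$; each such row gives $S_i = \Delta_{t_0+i}$, so that parameter is read off directly. Because the entries are only 0 and 1, this works over $\mathbb{Z}/q\mathbb{Z}$ for any $q$, not just primes.

```python
def K(n):
    """Binary vector K_n of Equation (2): K_n[i] = 1 iff n-1-i is a power of two."""
    def is_power_of_two(m):
        return m > 0 and m & (m - 1) == 0
    return [1 if is_power_of_two(n - 1 - i) else 0 for i in range(n)]


def unit_rows(V, ell):
    """Indices i whose row of M_{V,ell} (entries V[n-1-(i+ell)+j]) equals e_i."""
    n = len(V)
    M = [[V[n - 1 - (i + ell) + j] for j in range(ell)] for i in range(ell)]
    return [i for i in range(ell)
            if all(M[i][j] == (1 if j == i else 0) for j in range(ell))]
```

For $n = 64$ and every power of two $\ell \le 32$, rows $\ell/2, \ldots, \ell-1$ appear in the output, as the lemma predicts.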

We finally give the proof of Theorem 3.
Proof of Theorem 3. We assume that $n$ is a power of two. Let $V = K_n$. Note that $L(v)$ is a power
of two for every node $v$ in $T$, so Lemma 11 applies at every node. It follows from Lemma 11
and Fact 9 that $\sum_{v\in T} R_{V,L(v)} \ge \sum_{v\in T} L(v)/2 \in \Theta(n\log n)$. For $A$ chosen uniformly at random from $[q]^n$, apply
Lemma 7 to obtain the expected running time $\Omega(\frac{\delta}{w} n\log n)$ over $n$ next-operations. □

3 Online multiplication 

In this section we consider online multiplication of two $n$-digit numbers in base $q \ge 2$.
For a non-negative integer $X$, let $X[i]$ denote the $i$th digit of $X$ in base $q$, where the
positions are numbered starting with $0$ at the right (lower-order) end. We think of $X$ as
padded with zeros to make sure that $X[i]$ is defined for arbitrarily large $i$. For $j \ge i$, we
write $X[i..j]$ to denote the integer that is written $X[j]\cdots X[i]$ in base $q$. For example,
let $X = 15949$ (decimal representation) and $q = 8$ (octal):

$X = 37115$ (base 8), $X[0] = 5$
$X[1..3] = 711$ (base 8) $= 457$ (decimal), $X[3] = 7$
$X[3..10] = 37$ (base 8) $= 31$ (decimal), $X[15] = 0$
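In code, the digit notation amounts to integer division and remainder. The function names below are our own; the checks reproduce the example above.

```python
def digit(X, i, q):
    """X[i]: the i-th base-q digit of X, counting from 0 at the low-order end."""
    return (X // q ** i) % q


def digit_range(X, i, j, q):
    """X[i..j] for i <= j: the integer written X[j] ... X[i] in base q."""
    return (X // q ** i) % q ** (j - i + 1)
```

With `X = 15949` and `q = 8`, `digit_range(X, 1, 3, 8)` evaluates to 457 and `digit(X, 15, 8)` to 0.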

The online multiplication problem is defined as follows. The input is two $n$-digit
numbers $X, Y \in [q^n]$ in base $q$ (higher-order digits may be zero). Let $Z = X \times Y$.
We want to output the $n$ lower-order digits of $Z$ in base $q$ (i.e., $Z[0], \ldots, Z[n-1]$)
under the constraint that $Z[i]$ must be output before $Z[i+1]$ and, when $Z[i]$ is
output, we are not allowed to use any knowledge of the digits $X[i+1], \ldots, X[n-1]$
and $Y[i+1], \ldots, Y[n-1]$. We can think of the digits of $X$ and $Y$ arriving one pair at a
time, starting with the least significant pair of digits, and we output the corresponding
digit of the product of the two numbers seen so far.
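To make the online constraint concrete, here is a schoolbook sketch (not one of the algorithms discussed in Section 1.1) that emits $Z[i]$ after seeing only $X[0..i]$ and $Y[0..i]$: column $i$ of the product is $\sum_{j\le i} X[j]\,Y[i-j]$ plus the carry from column $i-1$. The generator name `online_multiply` is our own. It spends $O(i)$ operations on step $i$, hence $O(n^2)$ overall.

```python
def online_multiply(x_digits, y_digits, q):
    """Yield Z[0], Z[1], ... for Z = X * Y in base q, one digit per arriving pair.

    x_digits and y_digits list the base-q digits of X and Y from the
    low-order end.  Z[i] is produced after reading only X[0..i] and Y[0..i].
    """
    xs, ys = [], []
    carry = 0
    for x, y in zip(x_digits, y_digits):   # the i-th pair of digits arrives
        xs.append(x)
        ys.append(y)
        i = len(xs) - 1
        # Column i of the schoolbook product plus the incoming carry.
        column = carry + sum(xs[j] * ys[i - j] for j in range(i + 1))
        yield column % q
        carry = column // q
```

With $q = 10$, feeding the digit pairs of 1234 and 5678 from the low-order end yields 2, 5, 6, 6, the four lowest digits of $7006652$.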

We also consider a variant of the online multiplication problem where one of the two
input numbers, say $Y$, is known in advance. That is, all its digits are available at every
stage of the algorithm and only the digits of $X$ arrive in an online fashion. In particular,
we will consider the case when $Y = K_{q,n}$ is fixed, where we define $K_{q,n}$ to be the largest
number in $[q^n]$ such that the $i$th bit in the binary expansion of $K_{q,n}$ is 1 if and only if
$i$ is a power of two (starting with $i = 0$ at the lower-order end). We can see that the
binary expansion of $K_{q,n}$ is the reverse of $K_{n\log_2 q}$ in Equation (2). We will prove the
following result.

Theorem 12. For any positive integers $\delta$ and $n$, in the cell-probe model with $w$ bits per
cell, the expected running time of any algorithm solving the online multiplication problem
on two $n$-digit random numbers $X, Y \in [q^n]$ is $\Omega(\frac{\delta}{w} n\log n)$, where $q = 2^\delta$ is the base.
The same bound holds even under full access to every digit of $Y$, and when $Y = K_{q,n}$ is
fixed.
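The fixed multiplier $K_{q,n}$ is simple to construct when $q = 2^\delta$, so that numbers in $[q^n]$ have $n\delta$ bits. In the sketch below (names are ours), bit position $i$ of $K_{q,n}$ is set exactly when $i$ is a power of two; index $i$ of the vector $K_{n\delta}$ from Equation (2) then matches bit $n\delta-1-i$ of $K_{q,n}$, which is the sense in which one is the reverse of the other.

```python
def K_vec(m):
    """Binary vector K_m of Equation (2): entry i is 1 iff m-1-i is a power of two."""
    return [1 if (m - 1 - i) > 0 and (m - 1 - i) & (m - 2 - i) == 0 else 0
            for i in range(m)]


def K_qn(q, n):
    """K_{q,n} for q = 2**delta: bit i (bit 0 lowest) is 1 iff i is a power of two,
    for the n * delta bit positions of numbers in [q**n]."""
    bits = n * (q.bit_length() - 1)   # n * log2(q)
    value, p = 0, 1
    while p < bits:                   # set bit positions 1, 2, 4, 8, ...
        value |= 1 << p
        p *= 2
    return value
```

For example, $K_{2,8} = K_{4,4} = 22$, whose 8-bit expansion is 00010110.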

It suffices to prove the lower bound for the case when we have full access to every 
digit of $Y$; we could always ignore digits. We prove Theorem 12 using the same approach
as for the online convolution problem. Here the next-operation delivers a new digit of
$X$, which is chosen uniformly at random from $[q]$, and outputs the corresponding digit
of the product of $X$ and $Y$.

[Figure 2 image: the numbers X, Y and Z in base q, with the digit segments X′, Y′ and Z′ labelled.]

Figure 2: $X$, $Y$ and $Z = X \times Y$ in base $q$.

For $t_0, t_1, t_2 \in [n]$ such that $t_0 \le t_1 < t_2$, we write $X[t_0,t_1]^c$ to denote every digit of $X$
(in base $q$) except for those at positions $t_0$ through $t_1$. It is helpful to think of $X[t_0,t_1]^c$
as a vector of digits rather than a single number. We write $X^{\mathrm{fix}}[t_0,t_1]^c$ to denote a
fixed choice of $X[t_0,t_1]^c$. During the interval $[t_1+1, t_2]$, we output $Z[(t_1+1)..t_2]$. The
information transfer is defined as before, and Lemma 4 is replaced with the following
lemma.

Lemma 13. The entropy

$$H\big(Z[(t_1+1)..t_2] \mid X[t_0,t_1]^c = X^{\mathrm{fix}}[t_0,t_1]^c\big) \le w + 2w\cdot\mathbb{E}\big[\,|IT(t_0,t_1,t_2)| \mid X[t_0,t_1]^c = X^{\mathrm{fix}}[t_0,t_1]^c\big].$$

3.1 Retrorse numbers and the lower bound 

In Figure 2, the three numbers $X$, $Y$ and $Z = X \times Y$ are illustrated with some segments
of their digits labelled $X'$, $Y'$ and $Z'$. Informally, we say that $Y'$ is retrorse if $Z'$ depends
heavily on $X'$. We have borrowed the term from Paterson, Fischer and Meyer [PFM74];
however, we give it a more precise meaning, formalised below.

Suppose $[t_0,t_1]$ and $[t_1+1,t_2]$ both have the same length $\ell$. For notational brevity, we write $X'$ to denote $X[t_0..t_1]$, $Y'$ to denote $Y[0..(2\ell-1)]$ and $Z'$ to denote $Z[(t_1+1)..t_2]$ (see Figure 2). We say that $Y'$ is retrorse if, for any fixed values of $t_0$, $X[t_0,t_1]^{\complement}$ (the digits of $X$ outside $[t_0,t_1]$) and $Y[2\ell..(n-1)]$, each value of $Z'$ can arise from at most four different values of $X'$. That is to say, there is at most a four-to-one mapping from possible values of $X'$ to possible values of $Z'$. We define $I_{Y,\ell} = \ell$ if $Y'$ is retrorse, and $I_{Y,\ell} = 0$ otherwise. Note that $I_{Y,\ell}$ only depends on $Y$ and $\ell$. We will use $I_{Y,\ell}$ similarly to the recovery number from Section 2.2 and replace Lemma 5 with Lemma 14 below, which combined with Lemma 13 gives us Corollary 15.
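The retrorse property can be checked by exhaustive search for toy parameters. The sketch below is our own illustration, not a procedure from the paper or from [PFM74]; the function names are hypothetical, and it assumes that $t_0$ ranges over all positions with $t_0 + 2\ell - 1 \le n - 1$, so that $Z'$ lies within the low $n$ digits. Its running time is exponential in $n$.

```python
from collections import Counter
from itertools import product


def digits_to_int(digits, q):
    """Little-endian base-q digits (digits[0] least significant) to an integer."""
    value = 0
    for d in reversed(digits):
        value = value * q + d
    return value


def is_retrorse(y_low, ell, q, n):
    """Brute-force check that Y' = Y[0..2*ell-1] (given as the integer y_low) is retrorse.

    Tries every high part Y[2*ell..n-1], every t0 with t0 + 2*ell - 1 <= n - 1 (an
    assumption about the range of t0), and every choice of the digits of X outside
    [t0, t1]; for each, counts how many values of X' = X[t0..t1] give each value of
    Z' = Z[(t1+1)..t2]. Exponential in n: tiny inputs only.
    """
    assert 2 * ell <= n and 0 <= y_low < q ** (2 * ell)
    for y_high in range(q ** (n - 2 * ell)):
        y = y_low + y_high * q ** (2 * ell)
        for t0 in range(n - 2 * ell + 1):
            t1 = t0 + ell - 1
            for outside in product(range(q), repeat=n - ell):
                low = digits_to_int(outside[:t0], q)   # X[0..t0-1]
                high = digits_to_int(outside[t0:], q)  # X[t1+1..n-1]
                preimages = Counter()
                for x_mid in range(q ** ell):          # every value of X'
                    x = low + x_mid * q ** t0 + high * q ** (t1 + 1)
                    z_window = (x * y // q ** (t1 + 1)) % q ** ell
                    preimages[z_window] += 1
                if max(preimages.values()) > 4:        # worse than four-to-one
                    return False
    return True


def retrorse_indicator(y_low, ell, q, n):
    """I_{Y,ell}: ell if Y' is retrorse, otherwise 0."""
    return ell if is_retrorse(y_low, ell, q, n) else 0
```

For instance, with $q = 2$, $\ell = 4$ and $n = 8$, `is_retrorse(16, 4, 2, 8)` is true because multiplying by $Y = 2^4$ copies $X'$ into $Z'$, while `is_retrorse(1, 4, 2, 8)` is false because for $Y = 1$ the window $Z'$ equals $X[4..7]$, which is fixed.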
Lemma 14. If the intervals $[t_0,t_1]$ and $[t_1+1,t_2]$ both have the same length $\ell$, then the entropy

$$H\big(Z[(t_1+1)..t_2] \mid X[t_0,t_1]^{\complement} = X^{\mathrm{fix}}[t_0,t_1]^{\complement}\big) \ge I_{Y,\ell} \cdot \log_2 q - 2.$$

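Proof. The lemma is trivially true when $I_{Y,\ell} = 0$, so suppose that $I_{Y,\ell} = \ell$. Then $Y[0..(2\ell-1)]$ is retrorse, which implies that at most four distinct values of $X[t_0..t_1]$ yield the same value of $Z[(t_1+1)..t_2]$. There are $q^{\ell}$ possible values of $X[t_0..t_1]$, each with the same probability, so every value of $Z[(t_1+1)..t_2]$ has probability at most $4/q^{\ell}$. Hence, from the definition of entropy,

$$H\big(Z[(t_1+1)..t_2] \mid X[t_0,t_1]^{\complement} = X^{\mathrm{fix}}[t_0,t_1]^{\complement}\big) \ge \log_2 \frac{q^{\ell}}{4} = \ell \log_2 q - 2 = I_{Y,\ell} \cdot \log_2 q - 2. \qquad \square$$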
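The counting step in the proof above can be checked numerically: with the digits of $X$ outside $[t_0,t_1]$ fixed and $X'$ uniform, the entropy of $Z'$ is computed directly from the preimage counts and compared with $\log_2(q^{\ell}/4) = \ell\log_2 q - 2$. The helper names below are hypothetical and written only for illustration; `x_low` holds $X[0..t_0-1]$ and `x_high` holds the digits of $X$ above position $t_1$, each as an integer.

```python
from collections import Counter
from math import log2


def window_entropy(y, ell, q, t0, x_low, x_high):
    """Entropy in bits of Z' = Z[(t1+1)..t2] when X' = X[t0..t1] is uniform in [q^ell]
    and the remaining digits of X are fixed (x_low below t0, x_high above t1)."""
    t1 = t0 + ell - 1
    counts = Counter()
    for x_mid in range(q ** ell):
        x = x_low + x_mid * q ** t0 + x_high * q ** (t1 + 1)
        counts[(x * y // q ** (t1 + 1)) % q ** ell] += 1
    total = q ** ell
    return -sum(c / total * log2(c / total) for c in counts.values())


def lemma14_bound(ell, q):
    """log2(q^ell / 4): the bound used in the proof when Y' is retrorse."""
    return ell * log2(q) - 2
```

With $q = 2$, $\ell = 4$, $t_0 = 0$ and $Y = 16$, the map $X' \mapsto Z'$ is a bijection and the entropy is exactly 4 bits, above the bound of 2 bits; with $Y = 1$ it is 0 bits, since $Z'$ then does not depend on $X'$ at all.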

Corollary 15. For any fixed number $Y$, two intervals $[t_0,t_1]$ and $[t_1+1,t_2]$ of the same length $\ell$, and any algorithm solving the online multiplication problem on $X$ chosen uniformly at random from $[q^n]$,

$$\mathbb{E}\big[|\mathrm{IT}(t_0,t_1,t_2)|\big] \ge \frac{I_{Y,\ell} \cdot \log_2 q}{2w} - 1.$$
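The step from Lemmas 13 and 14 to Corollary 15 is a line of arithmetic. The sketch below spells it out, assuming Lemma 13 in the form $H \le w + 2w \cdot \mathbb{E}[|\mathrm{IT}| \mid \cdot\,]$, Lemma 14 in the form $H \ge I_{Y,\ell}\log_2 q - 2$, and a word size $w \ge 2$ (our assumption, used only to absorb the constant terms):

$$
\begin{aligned}
w + 2w \cdot \mathbb{E}\big[|\mathrm{IT}(t_0,t_1,t_2)| \mid X[t_0,t_1]^{\complement} = X^{\mathrm{fix}}[t_0,t_1]^{\complement}\big]
  &\ge I_{Y,\ell}\log_2 q - 2,\\
\mathbb{E}\big[|\mathrm{IT}(t_0,t_1,t_2)| \mid X[t_0,t_1]^{\complement} = X^{\mathrm{fix}}[t_0,t_1]^{\complement}\big]
  &\ge \frac{I_{Y,\ell}\log_2 q}{2w} - \frac{1}{w} - \frac{1}{2}
  \;\ge\; \frac{I_{Y,\ell}\log_2 q}{2w} - 1 \qquad (w \ge 2).
\end{aligned}
$$

Averaging over all fixed choices $X^{\mathrm{fix}}[t_0,t_1]^{\complement}$ then removes the conditioning, since the right-hand side does not depend on them.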

We take the same approach as in Section 2.3 and use a lower-bound tree $T$ with $n$ leaves to obtain the next lemma. The proof is identical to the proof of Lemma 7, except that we use Corollary 15 instead of Corollary 6.

To avoid technicalities, we assume that $n$ and $\delta$ are both powers of two, and we let the base $q = 2^{\delta}$.
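Section 2.3 is not reproduced here, so the following sketch assumes the usual construction of a lower-bound tree: a balanced binary tree over the time steps $0, \ldots, n-1$ in which each internal node splits its range into a left interval $[t_0,t_1]$ and a right interval $[t_1+1,t_2]$ of equal length $\ell$. The function names are hypothetical. If $I_{Y,\ell} = \ell$ at every node, as Lemma 19 gives for $Y = K_{q,n}$, then the sum of $I_{Y,\ell}$ over the internal nodes is $(n/2)\log_2 n$, since each of the $\log_2 n$ levels contributes $n/2$; this suggests where the $\log n$ factor in the bound comes from.

```python
def lower_bound_tree_nodes(n):
    """Return (t0, t1, t2) for every internal node of a balanced binary tree whose
    leaves are the time steps 0..n-1 (n must be a power of two). The left child
    covers [t0, t1] and the right child covers [t1 + 1, t2]."""
    assert n >= 1 and n & (n - 1) == 0, "n must be a power of two"
    nodes = []

    def visit(start, size):
        if size == 1:
            return  # a leaf has no pair of intervals
        half = size // 2
        nodes.append((start, start + half - 1, start + size - 1))
        visit(start, half)
        visit(start + half, half)

    visit(0, n)
    return nodes


def total_interval_length(n):
    """Sum of ell = t1 - t0 + 1 over all internal nodes; equals (n/2) * log2(n)."""
    return sum(t1 - t0 + 1 for t0, t1, _ in lower_bound_tree_nodes(n))
```

For $n = 8$ this gives one node with $\ell = 4$, two with $\ell = 2$ and four with $\ell = 1$, for a total of $12 = (8/2) \cdot 3$.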

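Lemma 16. For any fixed number $Y$ and any algorithm solving the online multiplication problem, the expected running time of the algorithm with the number $X$ chosen uniformly at random from $[q^n]$ is at least

$$\sum_{v} \left( \frac{\delta \cdot I_{Y,\ell(v)}}{2w} - 1 \right),$$

where the sum is over the internal nodes $v$ of $T$ and $\ell(v)$ is the common length of the two intervals associated with $v$.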



Before giving the proof of Theorem 12, we bound the value of $I_{Y,\ell}$ both for a random number $Y$ and for $Y = K_{q,n}$. In order to do so, we will use the following two results by Paterson, Fischer and Meyer [PFM74], which apply to binary numbers. The lemmas are stated in our notation, but the translation from the original notation of [PFM74] is straightforward.



Lemma 17 (Lemma 1 of [PFM74]). For the base $q = 2$ and fixed values of $t_0$, $\ell$, $n$ and $X[t_0,t_1]^{\complement}$ (where $t_1 = t_0 + \ell - 1$), such that $\ell$ is a power of two, each value of $Z'$ can arise from at most two values of $X'$ when $Y = K_{2,n}$.



Lemma 18 (Corollary of Lemma 5 in [PFM74]). For the base $q = 2$ and fixed values of $t_0$, $\ell$, $n$ and $X[t_0,t_1]^{\complement}$ (where $t_1 = t_0 + \ell - 1$), such that $\ell$ is a power of two, at least half of all possible values of $Y'$ have the property that each value of $Z'$ can arise from at most four different values of $X'$.
Lemma 19. If $\ell$ is a power of two, then for a random $Y \in [q^n]$, $\mathbb{E}[I_{Y,\ell}] \ge \ell/2$, and for $Y = K_{q,n}$, $I_{Y,\ell} = \ell$.

Proof. Suppose first that $Y = K_{q,n}$. Let $\ell$ be a power of two and $t_0$ a non-negative integer. We define $X'$, $Y'$ and $Z'$ as before (see Figure 2). Instead of writing the numbers in base $q$, we consider their binary expansions, in which each digit is represented by $\delta = \log_2 q$ bits. In binary, we can write $X$, $Y$ and $Z$ as in Figure 2 if we replace $n$, $t_0$ and $\ell$ with $\delta n$, $\delta t_0$ and $\delta\ell$, respectively. Note that $\delta\ell$ is a power of two. Since $K_{q,n} = K_{2,\delta n}$, it follows immediately from Lemma 17 that $Y'$ is retrorse and hence $I_{Y,\ell} = \ell$.

Suppose now that $Y$ is chosen uniformly at random from $[q^n]$, hence $Y'$ is a uniformly random number in $[q^{2\ell}]$. From Lemma 18 it follows that $Y'$ is retrorse with probability at least a half. Thus, $\mathbb{E}[I_{Y,\ell}] \ge \ell/2$. □
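The change of base in the first half of the proof rests on the identity $q^{t} = 2^{\delta t}$: digits $t_0$ through $t_0+\ell-1$ in base $q = 2^{\delta}$ are exactly bits $\delta t_0$ through $\delta(t_0+\ell)-1$. A small Python check of this identity, with hypothetical helper names, is:

```python
def digit_window(value, base, start, length):
    """Digits start..start+length-1 of value in the given base, read as one integer."""
    return (value // base ** start) % base ** length


def same_window_in_binary(value, delta, t0, ell):
    """Compare the base-q window [t0, t0+ell-1] with the binary window
    [delta*t0, delta*(t0+ell)-1], where q = 2**delta."""
    q = 2 ** delta
    return digit_window(value, q, t0, ell) == digit_window(value, 2, delta * t0, delta * ell)
```

For example, $228$ has base-4 digits $0, 1, 2, 3$ (least significant first), and its base-4 window at $t_0 = 1$ of length 2 equals its 4-bit binary window starting at bit 2, namely $9$.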

Proof of Theorem 12. We assume that $n$ is a power of two. Let $Y$ be a random number in $[q^n]$, either under the uniform distribution or under the distribution in which $K_{q,n}$ has probability one and every other number has probability zero. A lower bound on the running time is obtained by taking the expectation of the bound in the statement of Lemma 16. Applying linearity of expectation together with Lemma 19 and Fact 9 finishes the proof. Note from Lemma 19 that $\mathbb{E}[I_{Y,\ell}] = \ell$ when $Y = K_{q,n}$. □

4 Acknowledgements 

We are grateful to Mihai Pătraşcu for suggesting the connection between online lower bounds and the recent cell-probe results for dynamic data structures, and for very helpful discussions on the topic. We would also like to thank Kasper Green Larsen for the observation that our lower bounds are in fact tight within the cell-probe model. MJ was supported by the EPSRC.

References 

[CEPP11] R. Clifford, K. Efremenko, B. Porat, and E. Porat. "A Black Box for Online
Approximate Pattern Matching". In: Information and Computation 209.4
(2011), pp. 731-736.

[CS11] R. Clifford and B. Sach. "Pattern Matching in Pseudo Real-Time". In: Journal
of Discrete Algorithms 9.1 (2011), pp. 67-81.

[Day60] D. E. Daykin. "Distribution of bordered persymmetric matrices in a finite
field". In: Journal für die reine und angewandte Mathematik 203 (1960),
pp. 47-54.

[Fre78] M. Fredman. "Observations on the complexity of generating Quasi-Gray 
codes". In: SIAM Journal on Computing 7.2 (1978), pp. 134-146. 

[FS73] M. J. Fischer and L. J. Stockmeyer. "Fast On-Line Integer Multiplication".
In: STOC '73: Proc. 5th Annual ACM Symp. Theory of Computing, pp. 67-72.






[FS89] M. Fredman and M. Saks. "The cell probe complexity of dynamic data structures".
In: STOC '89: Proc. 21st Annual ACM Symp. Theory of Computing,
pp. 345-354.

[Gal81] Z. Galil. "String Matching in Real Time." In: Journal of the ACM 28.1 
(1981), pp. 134-149. 

[Hag98] T. Hagerup. "Sorting and searching on the word RAM". In: STACS '98:
Proc. 15th Annual Symp. on Theoretical Aspects of Computer Science, pp. 366-398.

[KL96] E. Kaltofen and A. Lobo. "On rank properties of Toeplitz matrices over
finite fields". In: ISSAC '96: 1996 International Symp. on Symbolic and
Algebraic Computation, pp. 241-249.

[Mor73] J. Morgenstern. "Note on a Lower Bound on the Linear Complexity of the 
Fast Fourier Transform". In: Journal of the ACM 20.2 (1973), pp. 305-306. 

[MP69] M. Minsky and S. Papert. Perceptrons: An Introduction to Computational 
Geometry. MIT Press, 1969. 

[MP88] M. Minsky and S. Papert. Perceptrons: An Introduction to Computational 
Geometry. MIT Press, 1988. 

[Pan86] V. Y. Pan. "The trade-off between the additive complexity and the asynchronicity
of linear and bilinear algorithms". In: Information Processing Letters
22.1 (1986), pp. 11-14.

[Pap79] C. H. Papadimitriou. "Optimality of the Fast Fourier transform". In: Journal
of the ACM 26.1 (1979), pp. 95-102.

[PD04] M. Pătraşcu and E. D. Demaine. "Tight bounds for the partial-sums problem".
In: SODA '04: Proc. 15th ACM-SIAM Symp. on Discrete Algorithms,
pp. 20-29.

[PD06] M. Pătraşcu and E. D. Demaine. "Logarithmic Lower Bounds in the Cell-Probe
Model". In: SIAM Journal on Computing 35.4 (2006), pp. 932-963.

[PFM74] M. S. Paterson, M. J. Fischer, and A. R. Meyer. "An Improved Overlap
Argument for On-Line Multiplication". In: SIAM-AMS Proceedings. Vol. 7.
Amer. Math. Soc., pp. 97-111.

[Pat08] M. Pătraşcu. "Lower bound techniques for data structures". PhD thesis.
MIT, 2008.

[Yao77] A. C.-C. Yao. "Probabilistic computations: Toward a unified measure of
complexity". In: FOCS '77: Proc. 18th Annual Symp. Foundations of Computer
Science, pp. 222-227.

[Yao81] A. C.-C. Yao. "Should Tables Be Sorted?" In: Journal of the ACM 28.3 
(1981), pp. 615-628. 






